UX improvements: TPM reseal (HOTP/TOTP/DUK) adds integrity report; detects disk/TPM swap and guides user into action; adds terminal colors and guidance. Reduced quiet noise. #2068
Pull request overview
This PR improves Heads’ TPM reseal UX by adding an integrity “gate” (TOTP/HOTP + /boot verification) and better detection/handling of TPM/disk swap or rollback-counter inconsistencies, plus some QEMU-focused debugging/documentation updates.
Changes:
- Add measured integrity reporting + discrepancy investigation flows, and integrate them into reseal/reset paths in the GUI.
- Improve TPM rollback-counter handling (preflight validation, clearer error guidance, better prompt visibility).
- Replace fdisk-based disk display with a sysfs-based helper and add QEMU troubleshooting/debug tips (including TPM2 pcap capture).
Reviewed changes
Copilot reviewed 8 out of 20 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| targets/qemu.md | Adds QEMU troubleshooting notes (Canokey state reuse, TPM2 pcap capture). |
| initrd/etc/gui_functions | Adds integrity report + investigation UI helpers; system info now uses disk_info_sysfs. |
| initrd/etc/functions | Adds trace stack, rollback-counter preflight helpers, sysfs disk info helper, and multiple TPM/boot-device related adjustments. |
| initrd/bin/unseal-totp | Improves TPM2 primary-handle error handling and adds nonfatal mode support. |
| initrd/bin/unseal-hotp | Improves TPM2 primary-handle + rollback-state-aware error handling and adds nonfatal mode support. |
| initrd/bin/tpmr | Improves TPM2 counter increment auth handling, counter-create UX, and TPM2 seal/unseal messaging. |
| initrd/bin/seal-totp | Adds TPM2 primary-handle precheck + clearer sealing failure guidance. |
| initrd/bin/root-hashes-gui.sh | Improves tracing/debugging and adds more flexible LVM LV selection/cleanup. |
| initrd/bin/oem-system-info-xx30 | Switches disk listing to disk_info_sysfs to avoid fdisk/busybox limitations. |
| initrd/bin/oem-factory-reset | Adjusts TPM counter increment handling and removes duplicated integrity report implementation. |
| initrd/bin/kexec-sign-config | Changes TPM counter increment handling and adds a pre-check for empty GPG keyring; modifies signing pipeline. |
| initrd/bin/kexec-select-boot | Hard-fails on TPM2 primary handle hash mismatch with a stronger warning. |
| initrd/bin/kexec-seal-key | Tweaks passphrase prompts/formatting for improved UX. |
| initrd/bin/gui-init | Adds integrity gate + rollback-counter preflight UX and integrates investigation/report flows. |
| boards/qemu-coreboot-fbwhiptail-tpm2/qemu-coreboot-fbwhiptail-tpm2.config | Documents TPM2 pcap capture option in board config. |
| boards/qemu-coreboot-fbwhiptail-tpm2-prod_quiet/qemu-coreboot-fbwhiptail-tpm2-prod_quiet.config | Adds a new “prod_quiet” QEMU TPM2 board config. |
| boards/qemu-coreboot-fbwhiptail-tpm2-hotp-prod_quiet/qemu-coreboot-fbwhiptail-tpm2-hotp-prod_quiet.config | Adjusts board name and minor formatting. |
| boards/qemu-coreboot-fbwhiptail-tpm1-prod_quiet/qemu-coreboot-fbwhiptail-tpm1-prod_quiet.config | Adds a new “prod_quiet” QEMU TPM1 board config. |
| boards/qemu-coreboot-fbwhiptail-tpm1-hotp-prod_quiet/qemu-coreboot-fbwhiptail-tpm1-hotp-prod_quiet.config | Adjusts board name. |
| .gitignore | Ignores *.asc files. |
3f855b8 to 3f2fe25
3f2fe25 to a1e063a
Copilot reviewed 8 out of 19 changed files in this pull request and generated 3 comments.
b905930 to 8be0849
8be0849 to 5b6ab4f
Add STATUS/STATUS_OK around the extraction loop so the user always sees when cbfs-init starts and finishes. Demote per-file output to DEBUG and update the STATUS text to describe what is being extracted. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
When seal-hotpkey fails mid-way (connection error, dongle removed), the HOTP slot on the dongle is left unconfigured. On the next boot, hotp_verification check returns exit code 6 (EXIT_SLOT_NOT_PROGRAMMED) which was unhandled, falling into the generic transient-error retry loop and leaving the user with no actionable guidance.
- Add exit code 6 case in update_hotp() retry loop: break immediately (retrying cannot configure an unconfigured slot), set HOTP status to "HOTP slot not configured" and BG_COLOR_MAIN_MENU="warning".
- Add a whiptail dialog for the slot-not-configured case that explains the likely cause and offers "Generate new TOTP/HOTP secret" or recovery shell as next steps.
- Export HOTPKEY_BRANDING after it is set in gui-init and seal-hotpkey so all child processes inherit the value without re-reading /boot/kexec_hotp_key. Re-export after the VID-based override in seal-hotpkey so the correct branding propagates.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
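The exit-code handling described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual update_hotp() code: `check_hotp` is a hypothetical stand-in for `hotp_verification check`, `FAKE_RC` exists only to drive the sketch, and the attempt limit of 3 and the "Success" / "Error checking code" strings are assumptions.

```shell
# Hypothetical stand-in for `hotp_verification check`; FAKE_RC drives the sketch.
check_hotp() { return "${FAKE_RC:-0}"; }

update_hotp_sketch() {
    attempts=0
    HOTP_STATUS=""
    BG_COLOR_MAIN_MENU=""
    while [ "$attempts" -lt 3 ]; do          # attempt limit is an assumption
        attempts=$((attempts + 1))
        rc=0
        check_hotp || rc=$?
        case "$rc" in
            0)  # placeholder status string
                HOTP_STATUS="Success"
                return 0
                ;;
            6)  # EXIT_SLOT_NOT_PROGRAMMED: retrying cannot configure the slot
                HOTP_STATUS="HOTP slot not configured"
                BG_COLOR_MAIN_MENU="warning"
                break
                ;;
            *)  # treated as transient: loop and try again (placeholder string)
                HOTP_STATUS="Error checking code"
                ;;
        esac
    done
    return 1
}
```

With `FAKE_RC=6` the loop stops after a single attempt, whereas any other non-zero code uses up all attempts.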
…nly on first setup /boot/kexec_hotp_key is written at the end of a successful seal and already holds the correct branding string. The VID-based detection block was unconditionally overwriting it on every run, discarding the stored value and always falling back to the generic "Nitrokey" label. Only run VID detection when the file does not yet exist (first-time OEM setup). On all subsequent seals the stored content is used as-is, which preserves any more specific branding set by the previous run. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…loop hotp_verification check does not consume a PIN retry - it verifies an HOTP code, not a PIN. Showing "PIN retries remaining" in the transient error retry path was misleading (implied a PIN was consumed) and caused the counter to be displayed twice when two consecutive transient failures occurred (USB glitch, NK3 connection error). Remove the re-query of hotp_verification info and the PIN retries STATUS from the retry handler; the WARN about the failed attempt is sufficient. The unused hotp_pin_retries and prompt_label locals are also removed. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
Two related bugs caused the GPG User PIN retry counter to appear stuck:
1. gpg_auth() (functions): confirm_gpg_card (which shows the current PIN counter) ran only in a pre-loop before the signing loop. The 3-attempt signing loop never re-queried the counter, so after a bad PIN the user saw "GPG authentication failed, please try again" with no updated count. Fix: move confirm_gpg_card inside the signing loop so it runs before each attempt, showing the decremented count after each wrong PIN. Use "until (confirm_gpg_card); do true; done" to preserve the existing card-presence retry behaviour within each signing attempt.
2. kexec-sign-config: confirm_gpg_card is correctly at the top of the for-tries loop, but bad PIN immediately called DIE, preventing tries 2 and 3 (with their updated count display) from being reached. Fix: on bad PIN with tries < 3, WARN and continue so the next loop iteration calls confirm_gpg_card again and shows the decremented count. On tries == 3, DIE with the full remediation message as before.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
Show "Attempt N/3" on the first prompt as well, not only on retries. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
Move minimum firmware version constants out of functions into a dedicated etc/dongle-versions file. Add a warning when the dongle firmware predates NK3 and requires external reprogramming rather than an in-system upgrade. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…ove atomically on success Both scripts write to a staging directory under /tmp rather than directly to the destination, then move files into place atomically on success. kexec-save-default also includes the staging path in DEBUG messages. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
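The staging approach described above might look something like the sketch below. `install_staged` is a hypothetical helper name, and the `/tmp/stage-XXXXXX` template is illustrative; neither is taken from the actual scripts.

```shell
# Write a file via a private staging directory, then move it into place.
# install_staged is a hypothetical helper; file content is read from stdin.
install_staged() {
    dest_dir=$1
    name=$2
    staging=$(mktemp -d /tmp/stage-XXXXXX) || return 1
    rc=0
    if cat > "$staging/$name"; then
        # The destination is touched only after the staged file is complete.
        mv -f "$staging/$name" "$dest_dir/$name" || rc=$?
    else
        rc=1
    fi
    rm -rf "$staging"
    return "$rc"
}
```

Usage would be along the lines of `generate_config | install_staged /boot example.txt`, where both names are placeholders. Note that `mv` is a single rename only when staging and destination are on the same filesystem; otherwise it copies and then unlinks.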
GPG signature verification (check_config, detached_kexec_signature_valid, root-hashes-gui.sh, oem-factory-reset) was broken after the mktemp/atomic staging changes: sha256sum embedded absolute staging-dir paths (/tmp/kexec-sign-XXXXXX/kexec_hashes.txt) into the signed data while verification re-ran sha256sum with /boot/kexec_hashes.txt paths, producing a guaranteed BAD signature on every boot after TPM reset or re-sign.
Fix: all signing and verification now cd into the target directory and use relative filenames, so the sha256sum output is path-independent and matches across sign→move→verify. Same pattern applied uniformly to all five call sites.
TPM DUK sealing (kexec-seal-key) hardened:
- DRK passphrase is now tested against ALL selected devices before accepting; partial success (some devices unlockable) is reported to the user with an explicit confirmation prompt; only the unlockable subset proceeds.
- kexec_key_devices.txt is rewritten to the unlockable subset so boot-time unlock is not attempted against devices that never received a DUK.
- Hard guard at luksKillSlot: DIE if the slot to wipe equals drk_key_slot, regardless of how wipe_desired was set; this prevents DRK destruction.
- find_drk_key_slot() now takes dev and keyslots as explicit arguments (was implicitly inheriting outer-scope variables).
- mapfile used instead of word-splitting subshell for luks_used_keyslots.
- All unquoted variables and [ p -o q ] patterns fixed throughout.
LUKS device/LVM selection (kexec-save-default):
- 'all' keyword accepted in device/LVM selection prompts; expands to all discovered devices. Empty input no longer silently accepted as valid.
- Prompt text updated to make 'all' discoverable.
TPM rollback preflight warning (gui-init):
- When the TPM counter cannot be read, the dialog now explicitly warns that /boot must be treated as UNTRUSTED if the condition was not intentional, matching the severity language of the integrity report.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
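To illustrate why relative filenames make the signed data path-independent, here is a small sketch. The function names are hypothetical, and the `kexec*.txt` glob follows the pattern mentioned elsewhere in this PR.

```shell
# Hash listing with bare filenames: independent of where the directory lives.
hash_relative() {
    ( cd "$1" && sha256sum kexec*.txt )
}

# Hash listing with full paths: the directory name leaks into the output,
# so a listing made in a staging dir never matches one made in /boot.
hash_fullpath() {
    sha256sum "$1"/kexec*.txt
}
```

Two directories holding identical files produce identical `hash_relative` output but different `hash_fullpath` output, which is the mismatch the commit describes.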
All $paramsdir/$paramsdev references now quoted to prevent word-splitting. Added comment explaining that kexec-seal-key may rewrite kexec_key_devices.txt to the unlockable DRK subset before kexec-sign-config runs, so the signed config always reflects only devices that actually received a DUK. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…ssphrase handling
kexec-seal-key: defer all paramsdir writes into one rw mount window at the end of the script. Previously the kexec_key_devices.txt cp happened early with no remount guard, so when reseal_tpm_disk_decryption_key called kexec-seal-key directly (not via kexec-save-key) the write failed with EROFS because /boot was still mounted ro. kexec_lukshdr_hash.txt was already guarded; now both writes share one mount -o rw,remount / cp -f / mount -o ro,remount block. Add cp -f to both writes for consistency.
luks-functions:
- luks_reencrypt: remove redundant passphrase re-read block (dead code since test_luks_current_disk_recovery_key_passphrase already sets and exports the variable); replace seq 0 31 brute-force keyslot scan with luksDump-based enumeration of only the enabled slots (matches the approach in kexec-seal-key).
- luks_change_passphrase: move new passphrase prompt before the per-container loop (was inside the elif on first iteration, confusing); write temp files once before the loop instead of per-container.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
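As a rough illustration of luksDump-based keyslot enumeration, the sketch below parses dump text from stdin. It assumes LUKS2-style `cryptsetup luksDump` output in which each enabled keyslot appears as an indented `N: luks2` line; `enabled_keyslots` is a hypothetical name, and the real scripts may parse the output differently.

```shell
# Print the numbers of enabled keyslots found in luksDump-style text on stdin.
# Assumes LUKS2 layout: enabled slots are listed as "  N: luks2".
enabled_keyslots() {
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\): luks2.*$/\1/p'
}
```

A caller could then iterate over `cryptsetup luksDump "$dev" | enabled_keyslots` instead of probing every slot number from 0 to 31.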
… visible feedback Boot log timing showed multi-second gaps where the user had no output. Add STATUS/STATUS_OK around the HOTP token presence check in gate_reseal_with_integrity_report (~3s gap), before wait_for_gpg_card in report_integrity_measurements (~1s gap), and before the TPM rollback counter read in reseal_tpm_disk_decryption_key. Standardize on "boot hashes" in update_checksums and related messages, consistent with kexec-select-boot's existing wording. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…sh after reseal When HOTP fails (slot not configured or invalid code), show the live TOTP ticker via show_totp_until_esc so the user can compare against their phone before deciding to renew. If TOTP matches, only HOTP needs renewal; if TOTP also mismatches, TPM tampering is more likely. After generate_totp_hotp succeeds, inline re-verify the newly sealed HOTP secret so the display reflects the new state immediately. Without this, the HOTP result remained at the pre-renewal error string until the user pressed 'r' (manual refresh) or rebooted. Also call update_totp && update_hotp after reseal_tpm_disk_decryption_key in show_tpm_totp_hotp_options_menu case 'g', covering the path where no LUKS devices are present and no reboot occurs. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…ng variable When detect_boot_device fails, mount_boot now shows a targeted message depending on whether LUKS partitions were found. If LUKS is present the OS was likely installed without a separate /boot; if not, no OS was found at all. Both cases explain that a separate unencrypted /boot is required and that DVD/live ISOs with legacy boot detection produce the correct partition layout. USB boot is offered as the primary option in both cases. Reuse the LUKS_PARTITION_DETECTED flag set by mount_possible_boot_device inside detect_boot_device rather than re-scanning disks in mount_boot. Replace hardcoded "Heads" with $CONFIG_BRAND_NAME in two user-visible messages: the TPM ownerwrite-only rollback preflight error in functions, and the integrity investigation recovery shell guidance in gui_functions. Also fix shellcheck SC2181 in mount_boot: use direct command check instead of testing $? separately. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…review Add missing STATUS_OK after preflight_rollback_counter_before_reseal and after the update_checksums loop in reseal_tpm_disk_decryption_key. Change NOTE to STATUS for the reseal announcement since kexec-seal-key is an internal operation, not a hand-off to an external tool. Add STATUS_OK "Boot hashes signed successfully" in kexec-sign-config before exit 0 so the success path always produces visible feedback. Quote all unquoted variables in scan_boot_options ($option_file, $bootdir) and fix SC2181 in mount_boot (replace [[ $? -eq 0 ]] && continue with a proper if/fi block, and quote $CONFIG_BOOT_DEV). Signed-off-by: Thierry Laurion <insurgo@riseup.net>
kexec-sign-config was changed to sign using relative filenames from a staging directory, but the previous version signed with full paths (sha256sum /boot/kexec*.txt). A firmware upgrade must not invalidate an existing valid /boot/kexec.sig. Try relative-path verification first (matches new signing format); if that fails, retry with full paths so signatures created by the previous kexec-sign-config are still accepted. Both paths populate /tmp/kexec on success so nothing downstream changes. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
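The try-relative-then-fall-back logic can be sketched as below. To keep the example self-contained, a plain string comparison against a stored listing stands in for the real gpgv signature check; `verify_listing` and its helpers are hypothetical names.

```shell
listing_relative() { ( cd "$1" && sha256sum kexec*.txt ); }
listing_fullpath() { sha256sum "$1"/kexec*.txt; }

# $1: directory holding kexec*.txt, $2: file with the listing that was "signed".
# Prints which format matched. The comparison is a stand-in for gpgv.
verify_listing() {
    dir=$1
    signed=$(cat "$2")
    if [ "$(listing_relative "$dir")" = "$signed" ]; then
        echo relative        # new signing format
        return 0
    fi
    if [ "$(listing_fullpath "$dir")" = "$signed" ]; then
        echo fullpath        # format used before the staging-dir change
        return 0
    fi
    return 1
}
```

Trying the new format first keeps the common case cheap after a re-sign, while signatures from the earlier format still verify after a firmware upgrade.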
The manual gpgv command in the recovery shell only covered the new relative-path format. Match the backwards-compat logic added to detached_kexec_signature_valid: try relative paths first, fall back to full paths for signatures created before the staging-dir change. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
Cache USB module load state in enable_usb() so repeated calls across the boot sequence (integrity report, GPG card check, etc.) skip the insmod wrapper spawns after the first load. Fix UUOC (cat | cut) and unquoted echo in kexec-sign-config staging hash update; remove spurious TRACE_FUNC at end of subshell. Simplify detached_kexec_signature_valid: store full paths once and derive relative names via ##*/ expansion instead of collecting basenames then reconstructing full paths for the legacy fallback. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
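The `##*/` expansion mentioned above strips the longest prefix ending in `/`, which yields the bare filename without spawning `basename`. A minimal sketch, with `relative_names` as a hypothetical helper name:

```shell
# Derive bare filenames from full paths using ${var##*/}.
relative_names() {
    for path in "$@"; do
        printf '%s\n' "${path##*/}"
    done
}
```

For example, `relative_names /boot/kexec_hashes.txt` prints `kexec_hashes.txt`, so the full paths only need to be stored once.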
…ack test-sign
Introduce cache_gpg_signing_pin() implementing both key paths:
- Smartcard (User PIN): show PIN retry counters, collect PIN via Heads INPUT, test detach-sign with --pinentry-mode=loopback --passphrase-file, verify, write to /tmp/secret/gpg_pin (mode 600), STATUS_OK on success, retry on bad PIN
- Backup key (Admin PIN): unchanged loopback import+test-sign flow; already caches
confirm_gpg_card() becomes a thin wrapper around cache_gpg_signing_pin(). [ -s /tmp/secret/gpg_pin ] early-return on second call (cache already primed).
gpg-agent.conf: switch pinentry-program to pinentry-tty, add allow-loopback-pinentry so --pinentry-mode=loopback works for smartcard operations.
All signing callers use --passphrase-file /tmp/secret/gpg_pin via loopback; pinentry is never called during signing.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
128-byte is technically correct but opaque to users. 128-character conveys the same information in terms they can relate to password strength. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
hotpkey_fw_display is called from multiple code paths (update_hotp, report_integrity_measurements, seal-hotpkey) and was showing the firmware version each time. Guard with /tmp/hotpkey_fw_shown so the NOTE is emitted only on the first call; subsequent calls return early. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
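The once-per-session guard can be sketched with a flag file as below. `show_fw_once` and the `FW_SHOWN_FLAG` variable are hypothetical; only the `/tmp/hotpkey_fw_shown` path comes from the commit message, and the `NOTE:` prefix merely imitates the logging helper.

```shell
# Flag file path from the commit message; overridable for experimentation.
FW_SHOWN_FLAG=${FW_SHOWN_FLAG:-/tmp/hotpkey_fw_shown}

# Emit the message only on the first call; later calls return early.
show_fw_once() {
    if [ -e "$FW_SHOWN_FLAG" ]; then
        return 0
    fi
    printf 'NOTE: %s\n' "$1"
    : > "$FW_SHOWN_FLAG"     # create the flag so later callers stay quiet
}
```

Because the flag lives on the filesystem rather than in a shell variable, it is also honoured by separate scripts and subshells in the same boot session.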
…g; remove dirmngr handler Update kexec-sign-config signing command to --pinentry-mode=loopback --passphrase-file /tmp/secret/gpg_pin now that cache_gpg_signing_pin() pre-populates the cache before signing begins. Remove --disable-dirmngr and --no-auto-key-retrieve and the dirmngr error handler: gpg2 is built with --disable-dirmngr (see modules/gpg2). Bad-PIN handler clears the cache so the next confirm_gpg_card call re-prompts. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…ngth
ux-patterns.md: add GPG User PIN caching section covering pinentry-tty-cache (smartcard Assuan interception path), assert_signable (backup key path), SETERROR cache invalidation, and the STATUS_OK-on-console safety property. Add once-per-session display pattern (hotpkey_fw_display / /tmp flag file).
security-model.md: note PIN caching in the signing section; update DUK description to 128 characters / 1024 bits of entropy.
tpm.md: add DUK key strength note (128-character, 1024-bit) in the sealing policy section.
logging.md: note that STATUS/STATUS_OK are safe to call from stdout-protocol scripts (Assuan etc.) because all log output goes to /dev/console.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
DIE() calls exit 1. Since /init was exec'ing cttyhack+gui-init, any
DIE call inside gui-init or any script it called would exit PID 1,
causing a kernel panic ("Attempted to kill init!").
Replace `exec cttyhack "$CONFIG_BOOTSCRIPT"` with a while-true respawn
loop that runs cttyhack without exec. /init remains PID 1; if the boot
script exits for any reason (DIE, unhandled error, explicit exit) it is
restarted after a 2-second pause with a WARN on the console.
The normal success path (kexec into the OS) is unaffected — kexec
replaces the running kernel and /init never returns in that case.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
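A bounded sketch of the respawn idea follows. The real loop described above is `while true` around cttyhack and the boot script; here the maximum restart count and the `die_like` function are assumptions added only so the example terminates and can be exercised.

```shell
# $1: command to run, $2: maximum restarts (sketch only), $3: pause in seconds
respawn_sketch() {
    cmd=$1
    max=$2
    pause=$3
    restarts=0
    while [ "$restarts" -lt "$max" ]; do
        # Run as a child (subshell), not via exec: an `exit 1` inside the
        # command ends only the child, and this shell keeps running.
        ( "$cmd" ) || true
        restarts=$((restarts + 1))
        echo "WARN: boot script exited; restarting in ${pause}s" >&2
        sleep "$pause"
    done
}

# Hypothetical stand-in for a boot script that hits DIE.
die_like() { exit 1; }
```

Calling `respawn_sketch die_like 2 0` restarts the failing command twice and then returns, whereas `exec die_like` would have replaced and terminated the calling shell.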
…hrase The rollback preflight failure dialog was grepping the diagnostic message and replacing it with a generic summary, discarding the counter ID and specific failure condition the user needs to understand what happened. Show preflight_error_msg directly (stripped of the "Reset TPM from GUI..." action guidance that the menu already provides). Users now see, e.g., "TPM rollback counter 'abcd1234' cannot be read." instead of "Stored TPM rollback metadata cannot be read." doc/tpm.md: add rollback preflight failure UX table and loop description. doc/ux-patterns.md: add rule against paraphrasing internal diagnostics. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…cator Use STATUS_OK (bold green) when firmware meets the minimum, NOTE with inline yellow when an upgrade is available, and NOTE with inline red when the device is below the reprogram threshold and cannot be upgraded via software. Previously: all three states used NOTE with an embedded fw_color variable, and the critical (below-reprogram) case emitted a separate WARN before the NOTE, producing two messages for the same device. Now: one message per device, color determined by severity. The critical flag is checked inside the Librem Key early-return path so that Librem Key firmware <= HOTPKEY_REPROGRAM_BELOW is also shown in red. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
fail_unseal() was defined identically in both unseal-hotp and unseal-totp, differing only in the debug message string. Move it to /etc/functions using basename \$0 so the message still identifies the calling script. detect_heads_tty() was copy-pasted between gui-init and gui-init-basic. Move it to /etc/functions and replace both inline blocks with a single call. Signed-off-by: Thierry Laurion <insurgo@riseup.net>
6a1c15a to b698631
Will merge in the next few days :)
Improve TPM/TOTP/HOTP recovery and reseal behavior by adding integrity-first
gating, clearer failure handling, and stronger rollback preflight checks.
before reseal/reset paths
fail early on inconsistent TPM state
clearer reset/reseal guidance, better TPM1/TPM2 handling)
actionable GPG error diagnostics
debug wrappers around sensitive interactive commands
metadata in sync
Tested: simulated or real firmware upgrade from master to this PR's CI-created ROM artifacts (03/11/2026)
Workflow change
CC @wessel-novacustom comments?
There were reports that Heads did not provide integrity checks prior to resealing TOTP/HOTP. This PR adds them so that the user can be confident about the state of /boot before resealing TOTP/HOTP/DUK, which re-signs /boot content.
Normal workflow after upgrading firmware while /boot unchanged
Normal non-hotp boot workflow requesting TPM DUK
Other corner cases
TPM reset from OS?
Similar to above, but pushes for TPM Reset since TPM reseal won't work
Replaced GPG key, mismatch with USB Security dongle, etc.
This is where testing of corner cases is lacking (too much time involved here already)